Personal Name Disambiguation in Web Search Results Based on a Semi-supervised Clustering Approach

Authors

  • Kazunari Sugiyama
  • Manabu Okumura
Abstract

Most previous work on disambiguating personal names in Web search results employs agglomerative clustering approaches. In contrast, we adopt a semi-supervised clustering approach in order to guide the clustering more appropriately. Our proposed approach is novel in that it controls the fluctuation of the centroid of a cluster. It achieved a purity of 0.72 and an inverse purity of 0.81, with a harmonic mean F of 0.76.
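The three scores quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions (purity as the size-weighted best precision of each cluster, inverse purity as the same measure with clusters and gold classes swapped, F as their harmonic mean). The function names and the toy document ids are hypothetical, and the paper may average scores per name rather than compute them exactly this way.

```python
from collections import Counter

def purity(assigned, reference):
    """Fraction of documents that fall in the majority reference class
    of their assigned cluster. Both arguments map doc id -> label."""
    groups = {}
    for doc, cluster in assigned.items():
        groups.setdefault(cluster, []).append(reference[doc])
    majority_total = sum(max(Counter(labels).values()) for labels in groups.values())
    return majority_total / len(assigned)

def inverse_purity(assigned, reference):
    """Purity with the roles of system clusters and gold classes swapped."""
    return purity(reference, assigned)

def harmonic_mean(p, ip):
    """Harmonic mean F of purity and inverse purity."""
    return 2 * p * ip / (p + ip) if (p + ip) else 0.0

# Toy example (hypothetical data): four pages, two system clusters, two real people.
system = {"d1": "a", "d2": "a", "d3": "b", "d4": "b"}
gold = {"d1": "x", "d2": "x", "d3": "x", "d4": "y"}

p = purity(system, gold)
ip = inverse_purity(system, gold)
f = harmonic_mean(p, ip)

# Consistency check on the figures reported in the abstract.
reported_f = round(harmonic_mean(0.72, 0.81), 2)
```

Under these assumptions, the harmonic mean of the reported purity and inverse purity rounds to the reported F of 0.76.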

Similar resources

Determine the Entity Number in Hierarchical Clustering for Web Personal Name Disambiguation

An internet user is often frustrated by ambiguous names in web search results when trying to find information about a particular person. Hierarchical clustering methods are often used to cluster the personal names that refer to the same entities. As the correct number of entities for a given personal name cannot be known in advance, we are required to determine the cut points in the dend...

TITPI: Web People Search Task Using Semi-Supervised Clustering Approach

Most of the previous works that disambiguate personal names in Web search results employ agglomerative clustering approaches. However, these approaches tend to generate clusters that contain a single element depending on a certain criterion of merging similar clusters. In contrast to such previous works, we have adopted a semisupervised clustering approach to integrate similar documents into a ...

Clustering web people search results using fuzzy ants

Person name queries often bring up web pages that correspond to different individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. T...

AUG: A combined classification and clustering approach for web people disambiguation

This paper presents a combined supervised and unsupervised approach for multidocument person name disambiguation. Based on feature vectors reflecting pairwise comparisons between web pages, a classification algorithm provides linking information about document pairs, which leads to initial clusters. In addition, two different clustering algorithms are fed with matrices of weighted keywords. In ...

Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation

We propose a supervised word sense disambiguation (WSD) system that uses features obtained from clustering results of word instances. Our approach is novel in that we employ semi-supervised clustering that controls the fluctuation of the centroid of a cluster, and we select seed instances by considering the frequency distribution of word senses and exclude outliers when we introduce “must-link”...

Publication date: 2007